[SPARK-16006][SQL] Attempting to write empty DataFrame with no fields throws non-intuitive exception #13730
dongjoon-hyun wants to merge 2 commits into apache:master from dongjoon-hyun:SPARK-16006
Conversation
Since validatePartitionColumn is used only by writing-related classes, I added this description for clarification.
We can update the function description if the usage pattern is changed in the future.
Hi, @tdas.
Test build #60686 has finished for PR 13730 at commit
It's weird that a check like this is in PartitioningUtils. This check seems to have nothing to do with partitioning; it's basically that certain file formats do not support writing a DataFrame with no columns. Is there somewhere earlier where you can check this?
Thank you for review, @tdas.
Yes, indeed. This is beyond the scope of PartitioningUtils.
Actually, this logic is used in 3 classes: PreWriteCheck, DataSource, and FileStreamSinkWriter.
I'll try to move this.
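The check under discussion can be illustrated with a small, self-contained sketch. This is a hypothetical simplification, not the actual Spark source: the real PartitioningUtils.validatePartitionColumn operates on a StructType and takes additional arguments (e.g. for case sensitivity), and the object and method names below are invented for illustration.

```scala
object PartitionCheck {
  // Hypothetical stand-in for the validation discussed above. A plain
  // Seq[String] of field names stands in for Spark's StructType schema.
  def validatePartitionColumns(schemaFields: Seq[String], partitionColumns: Seq[String]): Unit = {
    // Rejecting whenever every column is a partition column also rejected an
    // empty DataFrame (0 fields == 0 partition columns). Guarding on
    // schemaFields.nonEmpty allows an empty DataFrame to be written.
    if (schemaFields.nonEmpty && partitionColumns.size == schemaFields.size) {
      throw new IllegalArgumentException("Cannot use all columns for partition columns")
    }
  }
}
```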
Hi, @tdas. (Anyway, I will update this PR for further discussion.)
Test build #60704 has finished for PR 13730 at commit
Hi, @tdas.
Hi, @rxin.
Oh, sorry. The master was changed.
I will recheck this PR again.
Yep. The case still exists for
Hi, @tdas.
Test build #61014 has finished for PR 13730 at commit
Hi, @tdas.
Rebased.
Test build #61183 has finished for PR 13730 at commit
Test build #61254 has finished for PR 13730 at commit
Ping @tdas |
…throw non-intuitive exception
Test build #61391 has started for PR 13730 at commit
Retest this please.
Test build #61397 has finished for PR 13730 at commit
Hi, @tdas.
Thanks, merging into master/2.0.
…throws non-intuitive exception
## What changes were proposed in this pull request?
This PR allows `emptyDataFrame.write` since the user didn't specify any partition columns.
**Before**
```scala
scala> spark.emptyDataFrame.write.parquet("/tmp/t1")
org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
scala> spark.emptyDataFrame.write.csv("/tmp/t1")
org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
```
After this PR, no exception occurs and the created directory contains only one file, `_SUCCESS`, as expected.
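For contrast with the "Before" snippet, the post-fix behavior described above can be sketched as a spark-shell session (the output comment restates the PR description; it is not independently runnable without a Spark installation):

```scala
scala> spark.emptyDataFrame.write.parquet("/tmp/t1")
// No exception; /tmp/t1 contains only the _SUCCESS marker file.
```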
## How was this patch tested?
Pass the Jenkins tests including updated test cases.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #13730 from dongjoon-hyun/SPARK-16006.
(cherry picked from commit 9b1b3ae)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Thank you for merging, @rxin.